Resolving Translation Ambiguity Using Non-Parallel Bilingual Corpora
Abstract
This paper presents an unsupervised method for choosing the correct translation of a word in context. It learns disambiguation information from untagged, non-parallel bilingual corpora (preferably in the same domain). Our method combines two existing unsupervised disambiguation algorithms: a word sense disambiguation algorithm based on distributional clustering and a translation disambiguation algorithm using target language corpora. For the given word in context, the former algorithm identifies its meaning as one of a number of predefined usage classes derived by clustering a large number of usages in the source language corpus. The latter algorithm associates each usage class (i.e., cluster) with the target word most relevant to that usage. This paper also shows preliminary results of translation experiments.
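The two-stage idea described in the abstract can be illustrated with a small toy sketch. This is not the paper's algorithm: the cosine similarity over bag-of-words contexts, the co-occurrence scoring, the function names (`assign_cluster`, `label_cluster`, `translate`), and all the English/French data below are assumptions made purely for illustration. Stage 1 assigns a context to the nearest predefined usage class; stage 2 labels each usage class with the candidate translation that best matches it in target-language co-occurrence counts.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def assign_cluster(context, centroids):
    """Stage 1 (assumed form): pick the usage class whose centroid
    is most similar to the words surrounding the ambiguous word."""
    vec = Counter(context)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))

def label_cluster(centroid, candidates, dictionary, target_cooc):
    """Stage 2 (assumed form): pick the candidate translation that
    co-occurs most often, in the target corpus, with dictionary
    translations of the cluster's typical context words."""
    def score(cand):
        return sum(
            weight * target_cooc.get((cand, tgt), 0)
            for src, weight in centroid.items()
            for tgt in dictionary.get(src, [])
        )
    return max(candidates, key=score)

# Hypothetical toy data: usage classes of English "bank".
centroids = {
    "finance": Counter({"money": 3, "loan": 2, "account": 2}),
    "river": Counter({"river": 3, "water": 2, "shore": 1}),
}
# Hypothetical bilingual dictionary for context words.
dictionary = {
    "money": ["argent"], "account": ["compte"],
    "river": ["fleuve"], "water": ["eau"],
}
# Hypothetical co-occurrence counts from a target-language corpus.
target_cooc = {
    ("banque", "argent"): 5, ("banque", "compte"): 4,
    ("banque", "eau"): 1,
    ("rive", "fleuve"): 6, ("rive", "eau"): 3,
}
candidates = ["banque", "rive"]

# Label every usage class once, then translate by cluster lookup.
cluster_to_target = {
    name: label_cluster(c, candidates, dictionary, target_cooc)
    for name, c in centroids.items()
}

def translate(context):
    return cluster_to_target[assign_cluster(context, centroids)]

print(translate(["deposit", "money", "account"]))
print(translate(["water", "river", "fishing"]))
```

Note that the two corpora never need to be aligned in this sketch: the source side only supplies usage clusters, and the target side only supplies co-occurrence counts, with the dictionary acting as the bridge.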
Similar resources
Disambiguation of English PP Attachment using Multilingual Aligned Data
Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguist...
A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
We present two problems in the statistical extraction of bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora based on arrival distances of words ...
A Combination of Models for Bilingual Lexicon Extraction from Comparable Corpora
In this paper we present a method to extract bilingual terminologies from comparable non-aligned corpora by using multiple linguistic knowledge sources, such as non-parallel corpora, bilingual thesauri, a preliminary bilingual dictionary, etc. We focus on two core technologies: bilingual lexicon extraction from comparable corpora and expansion through thesauri categories based on different ...
Automatic Construction of a Japanese-Chinese Dictionary via English
This paper proposes a method of constructing a dictionary for a pair of languages from bilingual dictionaries between each of the languages and a third language. Such a method would be useful for language pairs for which wide-coverage bilingual dictionaries are not available, but it suffers from spurious translations caused by the ambiguity of intermediary third-language words. To eliminate spu...
Automatic Thesaurus Generation through Multiple Filtering
In this paper, we propose a method of generating bilingual keyword clusters or thesauri from parallel or comparable bilingual corpora. The method combines morphological and lexical processing, bilingual word alignment, and graph-theoretic cluster generation. An experiment shows that the method is promising. 1 Introduction In this paper, we propose a method of automatic bilingua...
Publication date: 1999